1  Preface


This Final Technical Report describes our research activities on Contract F49620-89-C-0126 for
the period September 15, 1989 through December 31, 1990, and presents an overview of the work of
the entire period of the contract. Our basic approach to detecting and tracking motion is to
extract and match features, such as lines and regions, from an image sequence and to generate
motion estimates from these correspondences.
2  Introduction  and  Summary

This  research  project  has  addressed  motion  analysis  and  its  applications  with  the  study  of 
techniques  to  detect,  track,  and  predict  the  motion  of  moving  objects  from  a  moving  platform.  For 
this  project,  the  major  goal  was  the  development  of  techniques  for  describing  a  three-dimensional 
environment  using  a  sequence  of  images  from  a  mobile  robot.  This  goal  was  attacked  by  several 
separate  research  projects  in  motion  analysis,  which  also  used  general  image  analysis  techniques 
from  our  other  (past  or  current)  research  projects.  This  report  will  describe  the  overall  direction  of 
our  motion  analysis  research  with  descriptions  of  our  major  recent  results.  This  report  represents 
the  developments  of  the  September  15, 1989  to  December  14, 1990  period. 

Two basic approaches to motion analysis are the short range (optical flow) and long range
(feature point) methods. Observations by psychologists that the relative motion, or flow, of
scene points as projected on the retina can determine the relative depth of objects led computer
vision researchers to the idea of optical flow. Optical flow computations are limited to small
motions between views and have proven to be unstable and unreliable in the general (real image)
case. These methods are appealing in a mobile robot task for computing the vehicle motion, since
global techniques tend to reduce errors, but they have not been able to overcome their basic
computational problems in this case.

The other methods, called feature point or long range techniques, attempt to compute many of
the same properties as optical flow methods using far fewer points in each image. These use a
small set of corresponding points from the image sequence to compute the three-dimensional motion
and structure. Different methods require different numbers of points in various numbers of frames
under different assumptions. Generally, a set of equations that encapsulates the constraints
imposed by the assumptions (rigidity, small motions, etc.) is solved to derive the
three-dimensional motion parameters. These formulations are sensitive to noise in the input data
and produce unstable results, especially when only two views of the scene are used. In order to
capture the important constraints imposed by an extended sequence of views, we developed a
technique to estimate the motion parameters using five frames for general motion and three frames
for translational motion [Sharit86]. We continue to use this underlying motion computation and to
develop multi-frame techniques.

Motion analysis using feature point analysis techniques and multiple frames forms the central
focus of our work. This approach involves extracting a set of consistent features from a sequence
of images, finding the corresponding features in consecutive frames, and finally computing the
three-dimensional motion based on the correspondences, which also provides an estimate of the
structure of the moving objects or scene. These are often described separately or as sequential
operations, but integration into a single system and feedback to earlier processing is a major
part of the work.

Our effort includes several separate and related projects: analysis of closely spaced images
(spatio-temporal analysis) using features such as lines, corners, and regions to extract
three-dimensional structure information; matching edge based contours in a sequence of images;
integrating several feature detection and matching techniques to derive three-dimensional motion
and structure estimates; study of the formulation of the motion estimation problem; detection of
moving objects in a scene with a moving observer; and the visual guidance of a mobile robot.

This report presents the results in the major research components of our motion analysis work.
The results of this research have also appeared in other conferences and workshops. The work
described in the report and the writing of the report represent the efforts of several
researchers in our group. The feature matching work was performed by Salit Gazit with Gerard
Medioni. The spatio-temporal analysis is by Shou-Ling Peng with Gerard Medioni. The motion system
development was done by Yong Kim and Keith Price, the motion estimation work by Wolfgang Franzen,
and the mobile robot work primarily by Jean-Yves Cartoux.

2.1  SPATIO-TEMPORAL  ANALYSIS

The goal of our work in spatio-temporal analysis is to generate a dense optic flow map from a
motion sequence. Because of the sparseness of 0D features (corners) or 1D features (curves), we
feel 2D features (regions) are more likely to produce dense motion estimates.

Early work in spatio-temporal analysis, including that of [Bolles87], depended on knowing the
camera motion and restricted this motion to simple translations. These techniques use the close
spacing of the images to simplify the computation of correspondences between frames: the
corresponding feature is the closest one in the next frame.

The basis for the analysis is matching image features along slices cut through the time-image
volume of data. Assuming the motion can be approximated by piecewise translational motion along
the camera axis, and the focus of expansion (FOE) position is given, the motion direction of each
image element corresponding to a stationary object is known. If slices are cut at the FOE along
the directions radiating from the FOE, the match disparities are then the magnitudes of the image
plane velocity, giving a dense optic flow map.

Since  the  spatio-temporal  images  are  registered  by  a  rectangular  coordinate  system,  cutting 
radial  slices  is  equivalent  to  transforming  the  images  into  a  polar  coordinate  system.  This  causes 
resolution  problems:  if  the  slices  through  the  sequence  are  dense  enough  so  that  pixels  far  from  the 
FOE  are  sampled  by  one  slice,  then  pixels  close  to  the  FOE  are  included  in  many  slices. 
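
The resolution problem can be illustrated with a rough count. The sketch below is ours, not part of the original system: it assumes evenly spaced radial slices and a one-pixel sampling width, and the function names are hypothetical.

```python
import math

def slices_needed(max_radius):
    """Number of evenly spaced radial slices so that pixels at max_radius
    (in pixels from the FOE) are each crossed by about one slice."""
    return math.ceil(2 * math.pi * max_radius)

def slices_per_pixel(radius, max_radius):
    """Approximate number of slices crossing one pixel at the given radius."""
    circumference = 2 * math.pi * radius   # pixels on the circle at this radius
    return slices_needed(max_radius) / circumference
```

Under these assumptions, a pixel 10 pixels from the FOE is covered by roughly 20 slices when the slicing is dense enough for pixels 200 pixels away.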

If  we  examine  the  slices  more  carefully,  some  pairs  of  paths  serve  as  the  non-parallel  sides 
of  trapezoidal  regions.  Each  such  region  corresponds  to  a  collection  of  chords  of  a  moving  object 
seen  in  each  image  in  the  sequence.  If  we  assume  that  the  velocity  changes  smoothly  between  two 
paths  (i.e.  between  two  points  on  an  object),  we  generate  flow  values  for  all  pixels  in  the  region  by 
interpolation. 
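
A minimal sketch of the interpolation step, assuming linear interpolation along one slice between two matched paths. The report says only that the velocity changes smoothly, so the linear form and the function name are our assumptions.

```python
def interpolate_flow(x_left, v_left, x_right, v_right):
    """Linearly interpolate velocity for every integer pixel position between
    two paths at x_left and x_right (positions along one slice).

    Returns a dict mapping pixel position -> interpolated velocity."""
    if x_right <= x_left:
        raise ValueError("x_right must be greater than x_left")
    span = x_right - x_left
    return {
        x: v_left + (v_right - v_left) * (x - x_left) / span
        for x in range(x_left, x_right + 1)
    }
```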

Assuming the motion in the scene is approximated by piecewise translational motion along the
axis of the camera, and the focus of expansion (FOE) position is given, the optical flow
direction of each image element can be determined. If slices are cut at the FOE along the
direction radiating from the FOE and image points are matched in the slice, the match disparities
are then the magnitude of the velocity. This, when combined with the interpolation, produces a
dense optic flow map.

Figure 1. SRI Sequence: Hallway (panels: First Frame, Velocity Needle Diagram)

Figure 2. SRI Sequence: Zoom (panels: First Frame, Velocity Needle Diagram)

We devised a parallel algorithm to approximate the complete radial slicing, which simplifies
this data access problem. We take slices at each pixel along only four directions: horizontal,
vertical, 45°, and -45°. Using the interpolation step mentioned above, each pixel has at most
four estimates of the velocity components along different directions. Using the method presented
earlier in [Peng88,Peng89], the normal velocity of the pixel is recovered. With both the motion
direction (from the FOE position) and the normal velocity (from the slice analysis), we are able
to compute the velocity of the pixel. In our experiments, the results from both approaches are
very similar. Results of using this technique are shown in Figures 1 and 2, showing one frame of
a sequence, the computed velocity for the optical flow, and the typical optical flow direction
diagram.
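
One way to combine the known flow direction with the directional estimates is sketched below. This is our illustration, not the method of [Peng88,Peng89]: it assumes each slice yields the projection of the true velocity onto the slice direction and recovers the speed along the radial direction by least squares. All names are hypothetical.

```python
import math

_R = 1 / math.sqrt(2)
# Unit vectors for the four slice directions: horizontal, vertical, 45, -45 degrees.
SLICE_DIRECTIONS = [(1.0, 0.0), (0.0, 1.0), (_R, _R), (_R, -_R)]

def radial_direction(pixel, foe):
    """Unit vector pointing from the FOE to the pixel (the assumed flow direction)."""
    dx, dy = pixel[0] - foe[0], pixel[1] - foe[1]
    r = math.hypot(dx, dy)
    if r == 0:
        raise ValueError("flow direction is undefined at the FOE")
    return dx / r, dy / r

def speed_from_components(pixel, foe, components):
    """Least-squares speed along the radial direction.

    components: four measured velocity components in SLICE_DIRECTIONS order;
    use None where a slice gave no estimate."""
    d = radial_direction(pixel, foe)
    num = den = 0.0
    for s, c in zip(SLICE_DIRECTIONS, components):
        if c is None:
            continue
        proj = d[0] * s[0] + d[1] * s[1]   # cosine between flow and slice direction
        num += c * proj
        den += proj * proj
    if den == 0:
        raise ValueError("no usable component estimates")
    return num / den
```

A slice nearly perpendicular to the flow direction contributes little to the sum, which is the behavior one would want from a weakly constrained estimate.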

2.2  FEATURE-BASED  MOTION  CORRESPONDENCE 

Feature matching is a major component of any feature based motion system. We have developed
and used several different general feature matching methods in the past. Under this research
program we are developing a contour based matching approach that uses large scale matches to
guide the finding of detailed edge element matches for images with spacings greater than those
handled by the spatio-temporal approaches. This approach begins by finding matches for line
segment approximations of the edge contours in the images. Then portions of the contour are
matched, using the approximate matches given by the segment matches to limit the final edge
element to edge element matches. The set of pair-wise matches is combined to generate traces of
matching edge elements through the entire sequence. Matches through parts of the sequence are
also maintained.

This contour-based matching technique is derived from our earlier work [Gazit88,Gazit89]. The
major improvements are that multiple frame matches need not extend through the entire sequence of
images, thus allowing for occlusion and (re)appearance of points midway through the sequence, and
the use of neighborhood and length to distinguish between correct and incorrect matches. These
changes have resulted in a significant improvement in the quality of the matches.

The contour matching method is, briefly, as follows. A super-segment is an object described
both as a list of connected edgels and as a list of connected line-segments (that approximate the
edgel contour). The algorithm tries to match sections of super-segments. Since a single object
may correspond to several different super-segments, and a single super-segment may include more
than one object, the problem is to identify the matching super-segment sections. We base our
initial matching criterion only on shape similarity and proximity (with a maximum allowable
disparity). An initial approximation is found by first matching the line-segments and combining
matches along each super-segment. Next we compute the section matches themselves. In order to
find appropriate matching sections, we break the line-segment approximations used in the previous
stage into arbitrarily small sections and match them (along the possibly matching section) by
maximizing the similarity between the matching sections as well as the length (in points) of the
matching sections. The result is a very large set of matches, the great majority of which are
spurious. The main thrust of the work is in how to deal with these spurious matches.

Our solution to distinguish between incorrect and correct matches is based on the assumption
that correct matches will usually either be long or have approving neighbors, which are neighbor
matches representing a similar motion. The neighborhood size should ideally depend on object
size, but since this step comes before object segmentation, we instead use a fixed fraction of
the image size. Each match is assigned an approving-length score, which is a combination of the
total length of supporting neighboring matches and their number; a non-approving-length score,
computed from the non-supporting neighbor matches; and a shape similarity score. Using these
three measures together allows us to detect incorrect matches in most cases, since they are
short, have little neighborhood support, and have a lot of neighborhood rejection, as they
represent an inconsistent motion. Notable exceptions occur with straight line contours, which are
easy to detect and for which we have a partial solution, and with repetitive structures.
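
The neighborhood scoring can be sketched as follows. The report does not give the formulas, so the match representation (dictionary keys `pos`, `disp`, `len`), the neighborhood fraction, and the motion tolerance are all hypothetical. The sketch returns the raw ingredients of the scores without fixing how they are combined.

```python
import math

def score_match(index, matches, image_size, radius_fraction=0.1, tolerance=5.0):
    """Return (approving_length, approving_count, non_approving_length) for one match.

    Each match is a dict with assumed keys:
      'pos'  : (x, y) centre of the matched section in the first frame
      'disp' : (dx, dy) displacement of the section to the second frame
      'len'  : section length in edge points
    """
    radius = radius_fraction * image_size   # neighborhood is a fixed fraction of image size
    me = matches[index]
    approving_length = approving_count = non_approving_length = 0
    for j, other in enumerate(matches):
        if j == index or math.dist(me["pos"], other["pos"]) > radius:
            continue                          # skip self and non-neighbors
        if math.dist(me["disp"], other["disp"]) <= tolerance:
            approving_length += other["len"]  # neighbor shows a similar motion
            approving_count += 1
        else:
            non_approving_length += other["len"]
    return approving_length, approving_count, non_approving_length
```

A short match with no approving neighbors and a large non-approving length would be a candidate for rejection.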

We apply this algorithm hierarchically at different scales for better performance [Gazit89].
We also combine pairwise matches into multiple-frame matches by tracking the matching sections
through the sequence. If section P1 in frame 1 matches section P2 in frame 2, Q2 in frame 2
matches Q3 in frame 3, and P2 overlaps Q2, we can compute a new match (R1, R2, R3) corresponding
to the overlapping part. This is applied to all frames in the image sequence. The resulting
section matches can be used for 3D motion estimation or motion segmentation.

Figure 3. Jeep and train image - Multiple Matches (panels: Multiple Matches of the Jeep,
Multiple Matches of the Train)
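
The chaining of two pairwise matches through their overlap in the shared frame can be sketched as below. The representation is an assumption made for illustration: each pairwise match is a triple (start in earlier frame, start in later frame, length) over edgel indices along a contour, with a one-to-one edgel correspondence.

```python
def chain_matches(match_12, match_23):
    """Combine a frame 1-2 match and a frame 2-3 match over their frame-2 overlap.

    Each match is (start_a, start_b, length): edgel start_a + i in the earlier
    frame corresponds to edgel start_b + i in the later frame.
    Returns (start_1, start_2, start_3, length), or None if there is no overlap."""
    a1, b1, len1 = match_12
    a2, b2, len2 = match_23
    overlap_start = max(b1, a2)                 # overlap measured in frame 2
    overlap_end = min(b1 + len1, a2 + len2)
    if overlap_end <= overlap_start:
        return None
    start_1 = a1 + (overlap_start - b1)         # map the overlap back to frame 1
    start_3 = b2 + (overlap_start - a2)         # and forward to frame 3
    return start_1, overlap_start, start_3, overlap_end - overlap_start
```

Applying the same step to each consecutive pair of frames extends a trace through as much of the sequence as the overlaps allow.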

As an example, a sequence consisting of 10 frames of 250 x 512 pixels of a toy jeep and train
is given in Figure 3. We only show the last two frames and the resulting multiple matches.
Because the motions of the objects overlap, and also to allow for better visibility, we manually
removed the background matches and separated the matches corresponding to the train from those
corresponding to the jeep.

In this scene the camera is stationary, but both the jeep and the train are moving. This is a
difficult scene, as the motion is very large (disparity ranges from 0 to 150 pixels) and the
scene contains occlusion.

2.3  DETECTING  MOVING  OBJECTS  FROM  A  MOVING  PLATFORM 

Detecting moving objects from a moving platform is a difficult problem, because the observer
motion causes stationary objects to appear to move. Thus, we must separate genuine motion from
the apparent motion of the stationary environment. We have developed a system that successfully
detects moving objects within a sequence of real images taken from an observation vehicle
traveling along a road. Unlike other systems, ours does not require densely-sampled imagery,
meaning that objects can move many pixels per frame with no detrimental effects. Nor does the
system rely on critical parameter settings. It is computationally efficient and highly suited to
parallel implementation. It requires no object matching or recognition, and can thus detect
moving objects that are partially occluded or that are camouflaged.


When an observer moves in a straight line toward a distant point in space, stationary objects
in the environment appear to move along paths radiating from that point. The point from which the
paths radiate is called the focus of expansion (FOE). We assume that the FOE and camera
orientation are relatively stable between successive images (i.e., the observer must not sharply
turn or tilt between images).

To simplify the problem, we first perform a Complex Logarithmic Mapping (CLM) as suggested by
[Cavanaugh78,Weiman79,Jain84]. This converts the problem from one of detecting a complex motion
along both the X and Y axes to one of detecting motion along an angular axis, with stationary
objects moving along the other axis.
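
A minimal sketch of the mapping for a single point, assuming the FOE is taken as the origin of the transform and is known in image coordinates. The function name is ours.

```python
import math

def complex_log_map(x, y, foe):
    """Map image point (x, y) to (log r, theta) measured about the FOE.

    A stationary point, which moves radially away from the FOE, changes only
    its log r coordinate; a change in theta indicates motion across the
    radial direction."""
    dx, dy = x - foe[0], y - foe[1]
    r = math.hypot(dx, dy)
    if r == 0:
        raise ValueError("the mapping is undefined at the FOE itself")
    return math.log(r), math.atan2(dy, dx)
```

For example, with the FOE at (100, 100), the points (110, 100) and (120, 100) map to the same angle and differ only in log r.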

To detect the angular motion in CLM space, we have developed a novel “moving-edge detector,”
which operates on successive images and produces a map containing all pixels on the edge regions
that are moving relative to the stationary background. This map is then thresholded to produce
detected movement. Preliminary results indicate that, once the threshold is raised high enough to
eliminate false alarms, it can be increased by a factor of five and still properly detect moving
objects.

The resulting detected movement is then transformed back into the rectangular reference frame
and overlaid upon the original image to highlight the detected objects. The results are presented
in more detail in [Frazier90]. This technique depends heavily on the correct computation of the
FOE and only loosely on the movement threshold value.

2.4  MOTION  ESTIMATION 

We have developed a solution for the multiframe structure from motion (SFM) problem using
feature matches. This work assumes a central projection pinhole camera with no smoothness
assumptions imposed concerning object surfaces. The use of multiple (as opposed to two) frames is
desirable for several reasons:

•  to  increase  the  robustness  of  the  solution, 

•  to  allow  recovery  of  structure/motion  with  fewer  features  being  tracked,  and 

•  to  allow  estimation  of  “higher  order  derivatives”  of  the  motion. 

In this work, we have developed and implemented two algorithms to solve the SFM problem that
make different assumptions concerning smoothness and type of motion. The first is a closed form
algorithm that models the relative motion between the camera and the object or environment as a
uniform 3D acceleration. The second is an iterative algorithm that can recover arbitrary rigid
transformations between frames. The closed form algorithm is currently being used to generate
initial guesses for the iterative algorithm when the rotation is known to be small.

The two algorithms share some characteristics. Both assume that features are matched through at
least three frames. The image plane position of each feature is modeled as having a bivariate
Gaussian error distribution, with the error coefficients provided as input. Although the
algorithms are developed using point features, they can process both point and line features. A
given feature is not required to be visible in every frame, so the algorithms can process
features that become (un)occluded during the image sequence. As output, both algorithms generate
the 3D location of each feature in each frame, along with the motion parameters.

The closed form algorithm models the motion as a uniform 3D acceleration. It minimizes a norm
that is closely related to the maximum-likelihood image plane error norm, subject to the
constraint that the mean interframe displacement must equal one. Under this formulation, the 3D
point positions are linear functions of the motion parameters, and the motion parameters can be
determined by solving a small eigenvalue problem. The computational complexity of the algorithm
is linear in the number of features being tracked times the number of frames.

The iterative algorithm solves the SFM problem as an unconstrained minimization problem. The
function to be minimized consists of 3 (classes of) terms:

1.  the  image  plane  error  or  a  more  or  less  convex  approximation  to  it, 

2.  terms which bias the motion to be chronogeneous or some subclass of chronogeneous
motion, and

3.  a  term  which  imposes  a  specific  scale  on  the  solution. 

Minor changes in the form of a term may dramatically alter the convergence properties of the
algorithm. The algorithm is currently very slow because analytic derivatives have not been
programmed, and a quasi-Newton method with a finite difference gradient is being used to do the
optimization.
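
The cost of a finite difference gradient can be seen in a small sketch. It uses a forward difference with a fixed step, which is our assumption since the report does not specify the scheme, and a hypothetical helper that counts function evaluations.

```python
def finite_difference_gradient(f, x, step=1e-6):
    """Forward-difference estimate of the gradient of f at the point x (a list).

    Needs len(x) + 1 evaluations of f, which is why the optimization is slow
    when f has many parameters and is expensive to evaluate."""
    f0 = f(x)
    gradient = []
    for i in range(len(x)):
        shifted = list(x)
        shifted[i] += step
        gradient.append((f(shifted) - f0) / step)
    return gradient

class CountingFunction:
    """Wrap a function and count how many times it is called."""
    def __init__(self, f):
        self.f = f
        self.calls = 0
    def __call__(self, x):
        self.calls += 1
        return self.f(x)
```

With one unknown per motion parameter and per feature coordinate, each gradient estimate requires as many extra evaluations of the objective as there are unknowns, whereas an analytic gradient would avoid them.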

2.5  INTEGRATED  SYSTEM  FOR  MOTION 

We have developed an integrated system for testing each of the subsystems of the motion
analysis system (segmentation, feature extraction and matching, motion estimation, motion
feedback to matching, and coordination). The results of each subsystem are saved in a single data
structure, and the coordination module controls the exchange of information between subsystems.
The integrated system is now being used to generate a rough description of the three-dimensional
structure of the environment, using region-based matches refined by corner matches over multiple
frames. This work is described in more detail in these proceedings in [Kim90].

Feature matching is done in a coarse-to-fine manner to reduce the search space and enhance
stability. Corner-based matching for a region is guided by the motion computed for the centers of
mass of the matched regions and by the constraint that matching corners are on the same region.
This allows large disparities between images and different motions for each of the regions.
Corner-based matching is performed in both the forward and reverse directions to decrease errors
in matching.
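
One common way to use matching in both directions is to keep only mutually consistent pairs. The sketch below assumes this is how the two passes are combined, which the report does not state; the dictionary representation of matches is also hypothetical.

```python
def mutual_matches(forward, backward):
    """Keep only corner matches confirmed in both directions.

    forward  maps a corner id in frame k   to its best match in frame k+1.
    backward maps a corner id in frame k+1 to its best match in frame k."""
    return {a: b for a, b in forward.items() if backward.get(b) == a}
```

For instance, if the forward pass pairs corner `c2` with `d5` but the reverse pass pairs `d5` with `c7`, the pair is dropped.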

We have developed a translation dominant motion analysis system as an additional feature of
the general motion analysis system. The basic assumptions are that each object in the scene is
undergoing a translation dominant motion and that an object may (or may not) be in coherent
motion with some of the others. For each region, an approximate FOE (focus of expansion) is
computed using an LMSE (least mean square error) estimation, the motion parameters are estimated,
and then depth is computed for the corners of the region. Each computed result is associated with
a reliability factor, which is a measure of the closeness of the computed motion to a
translational motion. Regions with a high reliability are given high priorities in the analysis,
and their results act as a guide in the analysis of the less reliable regions by giving some
constraints to the motion parameters.
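
A least-squares FOE estimate can be sketched as follows, assuming each corner displacement defines a line that should pass through the FOE and minimizing the sum of squared perpendicular distances to those lines. This formulation is ours; the report does not give the details of its LMSE estimate.

```python
import math

def estimate_foe(points, displacements):
    """Least-squares intersection of the lines through each point along its displacement.

    points        : list of (x, y) corner positions
    displacements : list of (dx, dy) image displacements of those corners
    Returns the (x, y) position minimizing the summed squared distances to the lines."""
    a11 = a12 = a22 = b1 = b2 = 0.0
    for (px, py), (dx, dy) in zip(points, displacements):
        norm = math.hypot(dx, dy)
        if norm == 0:
            continue                         # a zero displacement defines no line
        nx, ny = -dy / norm, dx / norm       # unit normal of the flow line
        c = nx * px + ny * py
        a11 += nx * nx; a12 += nx * ny; a22 += ny * ny
        b1 += nx * c;   b2 += ny * c
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("flow lines are parallel or insufficient")
    return (a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det
```

The residual of this fit is one possible ingredient for a reliability measure, since purely translational motion makes all lines meet at a single point.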

This motion analysis system was tested on two real image sequences. In one, a camera moves
straight along a hallway; in the other, a car moves from the right side of the image to the other
end. With a reasonable amount of noise, we could obtain an approximate environmental depth map
for most of the important regions in the scene. Depth maps with region-corner matches are shown
in [Kim90].

Experiments show some weak points of this system. First, the use of the FOE analysis for
general motion (translation + rotation) is sensitive to noise, and thus the computed motion
parameters are numerically unstable. In the case of translation dominant motion, an accurate
estimate of the FOE is essential for reliable results. Second, depth information is lost along a
smooth boundary, even when it forms a great part of a region, since fine structure is determined
by corner matches.

We continue to add more features to our integrated system. Primarily, we plan to add more
feedback links within the system so that erroneous matches at an early stage are detected and
corrected by results of later stages. This way, motion analysis is done as part of a cooperative
process rather than as an isolated stage of a sequential process.

2.6  MOBILE  PLATFORM 

We acquired a Denning mobile robot for both indoor and outdoor experimentation. The initial
phase of experimentation dealt with basic control and navigation issues, but the goals include
visual feature navigation and a platform for testing our other motion algorithms. We do not
intend to concentrate on real-time (high speed) control, which would only be possible with
additional special purpose computers, but to develop high-performance analysis algorithms. This
initial effort has produced:

•  an  obstacle  avoidance  routine  using  the  range  data  provided  by  the  24  ultrasonic  sensors 
of  the  robot,  and 

•  a  simple  planner  allowing  the  robot  to  navigate  indoors.

An obstacle in front of or on the sides of the robot is detected by checking the ultrasonic
sensors near the direction of motion. If there is an obstacle, the robot turns toward the
direction of the first sensor where the path is clear. This is intended as a low-level survival
process rather than a major navigational tool.
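
The avoidance rule can be sketched as below. The 24 sensors come from the text; the even 15-degree spacing, the sensor indexing (index 0 pointing along the direction of motion), the clearance distance, the width of the frontal window, and the outward search order are all assumptions made for illustration.

```python
NUM_SENSORS = 24
DEGREES_PER_SENSOR = 360 / NUM_SENSORS

def choose_turn(ranges, clear_distance=1.0, front_window=2):
    """Return the turn angle in degrees (0.0 means keep going), or None if boxed in.

    ranges[i] is the range reading of sensor i; sensor 0 is assumed to point
    along the direction of motion, with indices increasing to one side."""
    if len(ranges) != NUM_SENSORS:
        raise ValueError("expected one reading per sensor")
    front = [i % NUM_SENSORS for i in range(-front_window, front_window + 1)]
    if all(ranges[i] >= clear_distance for i in front):
        return 0.0                                   # nothing near the direction of motion
    for k in range(1, NUM_SENSORS // 2 + 1):         # search outward from the heading
        for sign in (1, -1):
            if ranges[(sign * k) % NUM_SENSORS] >= clear_distance:
                return sign * k * DEGREES_PER_SENSOR
    return None
```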

The  map  of  the  robot  world  is  represented  by  a  hierarchical  data  structure  that  includes 
buildings  which  are  defined  by  a  set  of  floors.  Each  floor  has  hallways,  a  set  of  rooms  and  a  set  of 
walls.  Each  wall  may  include  doors. 
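
The hierarchy can be sketched with a few record types. The field names, the geometric representation of walls, and the lookup helper are hypothetical; only the containment relations (building, floors, hallways, rooms, walls, doors) come from the description above.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, float]

@dataclass
class Door:
    name: str
    position: Point

@dataclass
class Wall:
    start: Point
    end: Point
    doors: List[Door] = field(default_factory=list)   # a wall may include doors

@dataclass
class Floor:
    level: int
    hallways: List[str] = field(default_factory=list)
    rooms: List[str] = field(default_factory=list)
    walls: List[Wall] = field(default_factory=list)

@dataclass
class Building:
    name: str
    floors: List[Floor] = field(default_factory=list)

def find_door(building, door_name):
    """Return (floor level, door) for the named door, or None if it is not in the map."""
    for floor in building.floors:
        for wall in floor.walls:
            for door in wall.doors:
                if door.name == door_name:
                    return floor.level, door
    return None
```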
In the first phase, the robot is assigned to navigate in the hallway of a floor. The ideal
trajectory is the mid-line between the two walls of the hallway. The planner first computes a
list of the axes of symmetry of each hallway path. Each axis is limited to the common part of the
pair of walls and must be inside the external polygon of the hallway, but not inside any of its
internal polygons. A merging step produces the axes of the corridors.

From the extremes and the intersection points of these axes, a graph of trajectory control
points is constructed. The path of the robot, shown by thick black circles and lines on the
figure, from its current location toward a goal door is then computed from the graph
representation. Finally, the list of path control points is given to the navigation routine,
which orients the robot toward the next path control point unless the robot is bypassing an
obstacle.